{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Deploying and Interacting with Campaigns <a class=\"anchor\" id=\"top\"></a>\n",
    "\n",
    "In this notebook, you will deploy and interact with campaigns in Amazon Personalize.\n",
    "\n",
    "1. [Introduction](#intro)\n",
    "1. [Create campaigns](#create)\n",
    "1. [Interact with campaigns](#interact)\n",
    "1. [Batch recommendations](#batch)\n",
    "1. [Wrap up](#wrapup)\n",
    "\n",
    "## Introduction <a class=\"anchor\" id=\"intro\"></a>\n",
    "[Back to top](#top)\n",
    "\n",
    "At this point, you should have several solutions and at least one solution version for each. Once a solution version is created, it is possible to get recommendations from them, and to get a feel for their overall behavior.\n",
    "\n",
    "This notebook starts off by deploying each of the solution versions from the previous notebook into individual campaigns. Once they are active, there are resources for querying the recommendations, and helper functions to digest the output into something more human-readable. \n",
    "\n",
    "As you with your customer on Amazon Personalize, you can modify the helper functions to fit the structure of their data input files to keep the additional rendering working.\n",
    "\n",
    "To get started, once again, we need to import libraries, load values from previous notebooks, and load the SDK."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "from time import sleep\n",
    "import json\n",
    "from datetime import datetime\n",
    "import uuid\n",
    "\n",
    "import boto3\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store -r"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You've seen how to connect to the Personalize SDK in both of the previous notebooks, so go ahead and do the same here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !ACTION REQUIRED!\n",
    "# Connect to the Personalize SDK using boto3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Establish a connection to Personalize's event streaming\n",
    "personalize_events = boto3.client(service_name='personalize-events')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create campaigns <a class=\"anchor\" id=\"create\"></a>\n",
    "[Back to top](#top)\n",
    "\n",
    "A campaign is a hosted solution version; an endpoint which you can query for recommendations. Pricing is set by estimating throughput capacity (requests from users for personalization per second). When deploying a campaign, you set a minimum throughput per second (TPS) value. This service, like many within AWS, will automatically scale based on demand, but if latency is critical, you may want to provision ahead for larger demand. For this POC and demo, all minimum throughput thresholds are set to 1. For more information, see the [pricing page](https://aws.amazon.com/personalize/pricing/).\n",
    "\n",
    "Let's start deploying the campaigns."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### HRNN\n",
    "\n",
    "Deploy a campaign for your HRNN solution version. It can take around 10 minutes to deploy a campaign. Normally, we would use a while loop to poll until the task is completed. However the task would block other cells from executing, and the goal here is to create multiple campaigns. So we will set up the while loop for all of the campaigns further down in the notebook. There, you will also find instructions for viewing the progress in the AWS console."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hrnn_create_campaign_response = personalize.create_campaign(\n",
    "    name = \"personalize-poc-hrnn\",\n",
    "    solutionVersionArn = hrnn_solution_version_arn,\n",
    "    minProvisionedTPS = 1\n",
    ")\n",
    "\n",
    "hrnn_campaign_arn = hrnn_create_campaign_response['campaignArn']\n",
    "print(json.dumps(hrnn_create_campaign_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### SIMS\n",
    "\n",
    "Deploy a campaign for your SIMS solution version. It can take around 10 minutes to deploy a campaign. Normally, we would use a while loop to poll until the task is completed. However the task would block other cells from executing, and the goal here is to create multiple campaigns. So we will set up the while loop for all of the campaigns further down in the notebook. There, you will also find instructions for viewing the progress in the AWS console."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sims_create_campaign_response = personalize.create_campaign(\n",
    "    name = \"personalize-poc-SIMS\",\n",
    "    solutionVersionArn = sims_solution_version_arn,\n",
    "    minProvisionedTPS = 1\n",
    ")\n",
    "\n",
    "sims_campaign_arn = sims_create_campaign_response['campaignArn']\n",
    "print(json.dumps(sims_create_campaign_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Personalized Ranking\n",
    "\n",
    "Deploy a campaign for your personalized ranking solution version. It can take around 10 minutes to deploy a campaign. Normally, we would use a while loop to poll until the task is completed. However the task would block other cells from executing, and the goal here is to create multiple campaigns. So we will set up the while loop for all of the campaigns further down in the notebook. There, you will also find instructions for viewing the progress in the AWS console.\n",
    "\n",
    "Go ahead and write the code for deploying a campaign with your personalized ranking solution. Hint: You'll need to remember one of the variables you stored from the previous notebook. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !ACTION REQUIRED!\n",
    "# Write the code to deploy a campaign with your personalized ranking solution"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### View campaign creation status\n",
    "\n",
    "As promised, how to view the status updates in the console:\n",
    "\n",
    "* In another browser tab you should already have the AWS Console up from opening this notebook instance. \n",
    "* Switch to that tab and search at the top for the service `Personalize`, then go to that service page. \n",
    "* Click `View dataset groups`.\n",
    "* Click the name of your dataset group, most likely something with POC in the name.\n",
    "* Click `Campaigns`.\n",
    "* You will now see a list of all of the campaigns you created above, including a column with the status of the campaign. Once it is `Active`, your campaign is ready to be queried.\n",
    "\n",
    "Or simply run the cell below to keep track of the campaign creation status."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "in_progress_campaigns = [\n",
    "    hrnn_campaign_arn,\n",
    "    sims_campaign_arn,\n",
    "    rerank_campaign_arn\n",
    "]\n",
    "\n",
    "max_time = time.time() + 3*60*60 # 3 hours\n",
    "while time.time() < max_time:\n",
    "    for campaign_arn in in_progress_campaigns:\n",
    "        version_response = personalize.describe_campaign(\n",
    "            campaignArn = campaign_arn\n",
    "        )\n",
    "        status = version_response[\"campaign\"][\"status\"]\n",
    "        \n",
    "        if status == \"ACTIVE\":\n",
    "            print(\"Build succeeded for {}\".format(campaign_arn))\n",
    "            in_progress_campaigns.remove(campaign_arn)\n",
    "        elif status == \"CREATE FAILED\":\n",
    "            print(\"Build failed for {}\".format(campaign_arn))\n",
    "            in_progress_campaigns.remove(campaign_arn)\n",
    "    \n",
    "    if len(in_progress_campaigns) <= 0:\n",
    "        break\n",
    "    else:\n",
    "        print(\"At least one campaign build is still in progress\")\n",
    "        \n",
    "    time.sleep(60)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Interact with campaigns <a class=\"anchor\" id=\"interact\"></a>\n",
    "[Back to top](#top)\n",
    "\n",
    "Now that all campaigns are deployed and active, we can start to get recommendations via an API call. Each of the campaigns is based on a different recipe, which behave in slightly different ways because they serve different use cases. We will cover each campaign in a different order than used in previous notebooks, in order to deal with the possible complexities in ascending order (i.e. simplest first).\n",
    "\n",
    "First, let's create a supporting function to help make sense of the results returned by a Personalize campaign. Personalize returns only an `item_id`. This is great for keeping data compact, but it means you need to query a database or lookup table to get a human-readable result for the notebooks. We will create a helper function to return a human-readable result from the LastFM dataset.\n",
    "\n",
    "Start by loading in the dataset which we can use for our lookup table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a dataframe for the items by reading in the correct source CSV\n",
    "items_df = pd.read_csv(data_dir + '/artists.dat', delimiter='\\t', index_col=0)\n",
    "\n",
    "# Render some sample data\n",
    "items_df.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By defining the ID column as the index column it is trivial to return an artist by just querying the ID."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "item_id_example = 987\n",
    "artist = items_df.loc[item_id_example]['name']\n",
    "print(artist)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That isn't terrible, but it would get messy to repeat this everywhere in our code, so the function below will clean that up."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_artist_by_id(artist_id, artist_df=items_df):\n",
    "    \"\"\"\n",
    "    This takes in an artist_id from Personalize so it will be a string,\n",
    "    converts it to an int, and then does a lookup in a default or specified\n",
    "    dataframe.\n",
    "    \n",
    "    A really broad try/except clause was added in case anything goes wrong.\n",
    "    \n",
    "    Feel free to add more debugging or filtering here to improve results if\n",
    "    you hit an error.\n",
    "    \"\"\"\n",
    "    try:\n",
    "        return artist_df.loc[int(artist_id)]['name']\n",
    "    except:\n",
    "        return \"Error obtaining artist\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's test a few simple values to check our error catching. Why don't you go ahead and write some code to test it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !ACTION REQUIRED!\n",
    "# Call the above helper function a couple of times with different input values to test the error catching"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Great! Now we have a way of rendering results. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### SIMS\n",
    "\n",
    "SIMS requires just an item as input, and it will return items which users interact with in similar ways to their interaction with the input item. In this particular case the item is an artist. \n",
    "\n",
    "So, let's sample some data from our dataset to test our SIMS campaign. Grab 5 random artists from our dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "samples = items_df.sample(5)\n",
    "samples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cells below will handle getting recommendations from SIMS and rendering the results. Let's see what the recommendations are for the first item we looked at earlier in this notebook (Earth, Wind & Fire)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "    campaignArn = sims_campaign_arn,\n",
    "    itemId = str(987),\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "item_list = get_recommendations_response['itemList']\n",
    "for item in item_list:\n",
    "    print(get_artist_by_id(artist_id=item['itemId']))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Congrats, this is your first list of recommendations! This list is fine, but it would be better to see the recommendations for our sample collection of artists render in a nice dataframe. Again, let's create a helper function to achieve this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Update DF rendering\n",
    "pd.set_option('display.max_rows', 30)\n",
    "\n",
    "def get_new_recommendations_df(recommendations_df, artist_ID):\n",
    "    # Get the artist name\n",
    "    artist_name = get_artist_by_id(artist_ID)\n",
    "    # Get the recommendations\n",
    "    get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "        campaignArn = sims_campaign_arn,\n",
    "        itemId = str(artist_ID),\n",
    "    )\n",
    "    # Build a new dataframe of recommendations\n",
    "    item_list = get_recommendations_response['itemList']\n",
    "    recommendation_list = []\n",
    "    for item in item_list:\n",
    "        artist = get_artist_by_id(item['itemId'])\n",
    "        recommendation_list.append(artist)\n",
    "    new_rec_DF = pd.DataFrame(recommendation_list, columns = [artist_name])\n",
    "    # Add this dataframe to the old one\n",
    "    recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)\n",
    "    return recommendations_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's test the helper function with the sample of artists we chose before."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sims_recommendations_df = pd.DataFrame()\n",
    "artists = samples.index.tolist()\n",
    "\n",
    "for artist in artists:\n",
    "    sims_recommendations_df = get_new_recommendations_df(sims_recommendations_df, artist)\n",
    "\n",
    "sims_recommendations_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You may notice that a lot of the items look the same, hopefully not all of them do. This shows that the evaluation metrics should not be the only thing you rely on when evaluating your solution version. So when this happens, what can you do to improve the results?\n",
    "\n",
    "This is a good time to think about the hyperparameters of the Personalize recipes. The SIMS recipe has a `popularity_discount_factor` hyperparameter (see [documentation](https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-sims.html)). Leveraging this hyperparameter allows you to control the nuance you see in the results. This parameter and its behavior will be unique to every dataset you encounter, and depends on the goals of the business. You can iterate on the value of this hyperparameter until you are satisfied with the results, or you can start by leveraging Personalize's hyperparameter optimization (HPO) feature. For more information on hyperparameters and HPO tuning, see the [documentation](https://docs.aws.amazon.com/personalize/latest/dg/customizing-solution-config-hpo.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### HRNN\n",
    "\n",
    "HRNN is one of the more advanced algorithms provided by Amazon Personalize. It supports personalization of the items for a specific user based on their past behavior and can intake real time events in order to alter recommendations for a user without retraining. \n",
    "\n",
    "Since HRNN relies on having a sampling of users, let's load the data we need for that and select 3 random users."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "users_df = pd.read_csv(data_dir + '/user_artists.dat', delimiter='\\t', index_col=0)\n",
    "# Render some sample data\n",
    "users_df.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "users = users_df.sample(3).index.tolist()\n",
    "users"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we render the recommendations for our 3 random users from above. After that, we will explore real-time interactions before moving on to Personalized Ranking.\n",
    "\n",
    "Again, we create a helper function to render the results in a nice dataframe.\n",
    "\n",
    "#### API call results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Update DF rendering\n",
    "pd.set_option('display.max_rows', 30)\n",
    "\n",
    "def get_new_recommendations_df_users(recommendations_df, user_id):\n",
    "    # Get the artist name\n",
    "    #artist_name = get_artist_by_id(artist_ID)\n",
    "    # Get the recommendations\n",
    "    get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "        campaignArn = hrnn_campaign_arn,\n",
    "        userId = str(user_id),\n",
    "    )\n",
    "    # Build a new dataframe of recommendations\n",
    "    item_list = get_recommendations_response['itemList']\n",
    "    recommendation_list = []\n",
    "    for item in item_list:\n",
    "        artist = get_artist_by_id(item['itemId'])\n",
    "        recommendation_list.append(artist)\n",
    "    #print(recommendation_list)\n",
    "    new_rec_DF = pd.DataFrame(recommendation_list, columns = [user_id])\n",
    "    # Add this dataframe to the old one\n",
    "    recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)\n",
    "    return recommendations_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "recommendations_df_users = pd.DataFrame()\n",
    "users = users_df.sample(3).index.tolist()\n",
    "print(users)\n",
    "\n",
    "for user in users:\n",
    "    recommendations_df_users = get_new_recommendations_df_users(recommendations_df_users, user)\n",
    "\n",
    "recommendations_df_users"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we clearly see that the recommendations for each user are different. If you were to need a cache for these results, you could start by running the API calls through all your users and store the results, or you could use a batch export, which will be covered later in this notebook.\n",
    "\n",
    "The next topic is real-time events. Personalize has the ability to listen to events from your application in order to update the recommendations shown to the user. This is especially useful in media workloads, like video-on-demand, where a customer's intent may differ based on if they are watching with their children or on their own.\n",
    "\n",
    "Additionally the events that are recorded via this system are stored until a delete call from you is issued, and they are used as historical data alongside the other interaction data you provided when you train your next models.\n",
    "\n",
    "#### Real time events\n",
    "\n",
    "Start by creating an event tracker that is attached to the campaign."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "response = personalize.create_event_tracker(\n",
    "    name='ArtistTracker',\n",
    "    datasetGroupArn=dataset_group_arn\n",
    ")\n",
    "print(response['eventTrackerArn'])\n",
    "print(response['trackingId'])\n",
    "TRACKING_ID = response['trackingId']\n",
    "event_tracker_arn = response['eventTrackerArn']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will create some code that simulates a user interacting with a particular item. After running this code, you will get recommendations that differ from the results above.\n",
    "\n",
    "We start by creating some methods for the simulation of real time events."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "session_dict = {}\n",
    "\n",
    "def send_artist_click(USER_ID, ITEM_ID):\n",
    "    \"\"\"\n",
    "    Simulates a click as an envent\n",
    "    to send an event to Amazon Personalize's Event Tracker\n",
    "    \"\"\"\n",
    "    # Configure Session\n",
    "    try:\n",
    "        session_ID = session_dict[str(USER_ID)]\n",
    "    except:\n",
    "        session_dict[str(USER_ID)] = str(uuid.uuid1())\n",
    "        session_ID = session_dict[str(USER_ID)]\n",
    "        \n",
    "    # Configure Properties:\n",
    "    event = {\n",
    "    \"itemId\": str(ITEM_ID),\n",
    "    }\n",
    "    event_json = json.dumps(event)\n",
    "        \n",
    "    # Make Call\n",
    "    personalize_events.put_events(\n",
    "    trackingId = TRACKING_ID,\n",
    "    userId= str(USER_ID),\n",
    "    sessionId = session_ID,\n",
    "    eventList = [{\n",
    "        'sentAt': int(time.time()),\n",
    "        'eventType': 'EVENT_TYPE',\n",
    "        'properties': event_json\n",
    "        }]\n",
    "    )\n",
    "\n",
    "def get_new_recommendations_df_users_real_time(recommendations_df, user_id, item_id):\n",
    "    # Get the artist name (header of column)\n",
    "    artist_name = get_artist_by_id(item_id)\n",
    "    # Interact with the artist\n",
    "    send_artist_click(USER_ID=user_id, ITEM_ID=item_id)\n",
    "    # Get the recommendations (note you should have a base recommendation DF created before)\n",
    "    get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "        campaignArn = hrnn_campaign_arn,\n",
    "        userId = str(user_id),\n",
    "    )\n",
    "    # Build a new dataframe of recommendations\n",
    "    item_list = get_recommendations_response['itemList']\n",
    "    recommendation_list = []\n",
    "    for item in item_list:\n",
    "        artist = get_artist_by_id(item['itemId'])\n",
    "        recommendation_list.append(artist)\n",
    "    new_rec_DF = pd.DataFrame(recommendation_list, columns = [artist_name])\n",
    "    # Add this dataframe to the old one\n",
    "    #recommendations_df = recommendations_df.join(new_rec_DF)\n",
    "    recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)\n",
    "    return recommendations_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At this point, we haven't generated any real-time events yet; we have only set up the code. To compare the recommendations before and after the real-time events, let's pick one user and generate the original recommendations for them.\n",
    "\n",
    "We've removed the code which makes the API call to the HRNN campaign from the cell below. Can you reproduce it?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# First pick a user\n",
    "user_id = users_df.sample(1).index.tolist()[0]\n",
    "\n",
    "#-------------------------\n",
    "# !ACTION REQUIRED!\n",
    "# Write some code in this spot to make a call to the HRNN campaign\n",
    "#-------------------------\n",
    "\n",
    "# Build a new dataframe for the recommendations\n",
    "item_list = get_recommendations_response['itemList']\n",
    "recommendation_list = []\n",
    "for item in item_list:\n",
    "    artist = get_artist_by_id(item['itemId'])\n",
    "    recommendation_list.append(artist)\n",
    "user_recommendations_df = pd.DataFrame(recommendation_list, columns = [user_id])\n",
    "user_recommendations_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, so now we have a list of recommendations for this user before we have applied any real-time events. Now let's pick 3 random artists which we will simulate our user interacting with, and then see how this changes the recommendations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Next generate 3 random artists\n",
    "artists = items_df.sample(3).index.tolist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Note this will take about 15 seconds to complete due to the sleeps\n",
    "for artist in artists:\n",
    "    user_recommendations_df = get_new_recommendations_df_users_real_time(user_recommendations_df, user_id, artist)\n",
    "    time.sleep(5)\n",
    "user_recommendations_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the cell above, the first column after the index is the user's default recommendations from HRNN, and each column after that has a header of the artist that they interacted with via a real time event, and the recommendations after this event occurred. \n",
    "\n",
    "The behavior may not shift very much; this is due to the relatively limited nature of this dataset. If you wanted to better understand this, try simulating clicking random artists of random genres, and you should see a more pronounced impact."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Personalized Ranking\n",
    "\n",
    "The core use case for personalized ranking is to take a collection of items and to render them in priority or probable order of interest for a user. To demonstrate this, we will need a random user and a random collection of 25 items."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rerank_user = users_df.sample(1).index.tolist()[0]\n",
    "rerank_items = items_df.sample(25).index.tolist()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now build a nice dataframe that shows the input data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rerank_list = []\n",
    "for item in rerank_items:\n",
    "    artist = get_artist_by_id(item)\n",
    "    rerank_list.append(artist)\n",
    "rerank_df = pd.DataFrame(rerank_list, columns = [rerank_user])\n",
    "rerank_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then make the personalized ranking API call. Again, we've removed the piece of the code that makes the actual API call. Can you reproduce it?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Convert user to string:\n",
    "user_id = str(rerank_user)\n",
    "rerank_item_list = []\n",
    "for item in rerank_items:\n",
    "    rerank_item_list.append(str(item))\n",
    "    \n",
    "#-------------------------\n",
    "# !ACTION REQUIRED!\n",
    "# Write some code in this spot to make a call to the personalized ranking campaign\n",
    "#-------------------------\n",
    "\n",
    "get_recommendations_response_rerank"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now add the reranked items as a second column to the original dataframe, for a side-by-side comparison."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ranked_list = []\n",
    "item_list = get_recommendations_response_rerank['personalizedRanking']\n",
    "for item in item_list:\n",
    "    artist = get_artist_by_id(item['itemId'])\n",
    "    ranked_list.append(artist)\n",
    "ranked_df = pd.DataFrame(ranked_list, columns = ['Re-Ranked'])\n",
    "rerank_df = pd.concat([rerank_df, ranked_df], axis=1)\n",
    "rerank_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see above how each entry was re-ordered based on the model's understanding of the user. This is a popular task when you have a collection of items to surface a user, a list of promotions for example, or if you are filtering on a category and want to show the most likely good items."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Batch recommendations <a class=\"anchor\" id=\"batch\"></a>\n",
    "[Back to top](#top)\n",
    "\n",
    "There are many cases where you may want to have a larger dataset of exported recommendations. Recently, Amazon Personalize launched batch recommendations as a way to export a collection of recommendations to S3. In this example, we will walk through how to do this for the HRNN solution. For more information about batch recommendations, please see the [documentation](https://docs.aws.amazon.com/personalize/latest/dg/getting-recommendations.html#recommendations-batch). This feature applies to all recipes, but the output format will vary.\n",
    "\n",
    "A simple implementation looks like this:\n",
    "\n",
    "```python\n",
    "import boto3\n",
    "\n",
    "personalize_rec = boto3.client(service_name='personalize')\n",
    "\n",
    "personalize_rec.create_batch_inference_job (\n",
    "    solutionVersionArn = \"Solution version ARN\",\n",
    "    jobName = \"Batch job name\",\n",
    "    roleArn = \"IAM role ARN\",\n",
    "    jobInput = \n",
    "       {\"s3DataSource\": {\"path\": S3 input path}},\n",
    "    jobOutput = \n",
    "       {\"s3DataDestination\": {\"path\":S3 output path\"}}\n",
    ")\n",
    "```\n",
    "\n",
    "The SDK import, the solution version arn, and role arns have all been determined. This just leaves an input, an output, and a job name to be defined.\n",
    "\n",
    "Starting with the input for HRNN, it looks like:\n",
    "\n",
    "\n",
    "```JSON\n",
    "{\"userId\": \"4638\"}\n",
    "{\"userId\": \"663\"}\n",
    "{\"userId\": \"3384\"}\n",
    "```\n",
    "\n",
    "This should yield an output that looks like this:\n",
    "\n",
    "```JSON\n",
    "{\"input\":{\"userId\":\"4638\"}, \"output\": {\"recommendedItems\": [\"296\", \"1\", \"260\", \"318\"]}}\n",
    "{\"input\":{\"userId\":\"663\"}, \"output\": {\"recommendedItems\": [\"1393\", \"3793\", \"2701\", \"3826\"]}}\n",
    "{\"input\":{\"userId\":\"3384\"}, \"output\": {\"recommendedItems\": [\"8368\", \"5989\", \"40815\", \"48780\"]}}\n",
    "```\n",
    "\n",
    "The output is a JSON Lines file. It consists of individual JSON objects, one per line. So we will need to put in more work later to digest the results in this format."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Building the input file\n",
    "\n",
    "When you are using the batch feature, you specify the users that you'd like to receive recommendations for when the job has completed. The cell below will again select a few random users and will then build the file and save it to disk. From there, you will upload it to S3 to use in the API call later."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the user list\n",
    "batch_users = users_df.sample(3).index.tolist()\n",
    "\n",
    "# Write the file to disk\n",
    "json_input_filename = \"json_input.json\"\n",
    "with open(data_dir + \"/\" + json_input_filename, 'w') as json_input:\n",
    "    for user_id in batch_users:\n",
    "        json_input.write('{\"userId\": \"' + str(user_id) + '\"}\\n')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Showcase the input file:\n",
    "!cat $data_dir\"/\"$json_input_filename"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Upload the file to S3 and save the path as a variable for later."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Upload files to S3\n",
    "boto3.Session().resource('s3').Bucket(bucket_name).Object(json_input_filename).upload_file(data_dir+\"/\"+json_input_filename)\n",
    "s3_input_path = \"s3://\" + bucket_name + \"/\" + json_input_filename\n",
    "print(s3_input_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Batch recommendations read the input from the file we've uploaded to S3. Similarly, batch recommendations will save the output to file in S3. So we define the output path where the results should be saved."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define the output path\n",
    "s3_output_path = \"s3://\" + bucket_name + \"/\"\n",
    "print(s3_output_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now just make the call to kick off the batch export process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "batchInferenceJobArn = personalize.create_batch_inference_job (\n",
    "    solutionVersionArn = hrnn_solution_version_arn,\n",
    "    jobName = \"POC-Batch-Inference-Job-HRNN\",\n",
    "    roleArn = role_arn,\n",
    "    jobInput = \n",
    "     {\"s3DataSource\": {\"path\": s3_input_path}},\n",
    "    jobOutput = \n",
    "     {\"s3DataDestination\":{\"path\": s3_output_path}}\n",
    ")\n",
    "batchInferenceJobArn = batchInferenceJobArn['batchInferenceJobArn']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the while loop below to track the status of the batch recommendation call. This can take around 25 minutes to complete, because Personalize needs to stand up the infrastructure to perform the task. We are testing the feature with a dataset of only 3 users, which is not an efficient use of this mechanism. Normally, you would only use this feature for bulk processing, in which case the efficiencies will become clear."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "current_time = datetime.now()\n",
    "print(\"Import Started on: \", current_time.strftime(\"%I:%M:%S %p\"))\n",
    "\n",
    "max_time = time.time() + 3*60*60 # 3 hours\n",
    "while time.time() < max_time:\n",
    "    describe_dataset_inference_job_response = personalize.describe_batch_inference_job(\n",
    "        batchInferenceJobArn = batchInferenceJobArn\n",
    "    )\n",
    "    status = describe_dataset_inference_job_response[\"batchInferenceJob\"]['status']\n",
    "    print(\"DatasetInferenceJob: {}\".format(status))\n",
    "    \n",
    "    if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n",
    "        break\n",
    "        \n",
    "    time.sleep(60)\n",
    "    \n",
    "current_time = datetime.now()\n",
    "print(\"Import Completed on: \", current_time.strftime(\"%I:%M:%S %p\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once the batch recommendations job has finished processing, we can grab the output uploaded to S3 and parse it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s3 = boto3.client('s3')\n",
    "export_name = json_input_filename + \".out\"\n",
    "s3.download_file(bucket_name, export_name, data_dir+\"/\"+export_name)\n",
    "\n",
    "# Update DF rendering\n",
    "pd.set_option('display.max_rows', 30)\n",
    "with open(data_dir+\"/\"+export_name) as json_file:\n",
    "    # Get the first line and parse it\n",
    "    line = json.loads(json_file.readline())\n",
    "    # Do the same for the other lines\n",
    "    while line:\n",
    "        # extract the user ID \n",
    "        col_header = \"User: \" + line['input']['userId']\n",
    "        # Create a list for all the artists\n",
    "        recommendation_list = []\n",
    "        # Add all the entries\n",
    "        for item in line['output']['recommendedItems']:\n",
    "            artist = get_artist_by_id(item)\n",
    "            recommendation_list.append(artist)\n",
    "        if 'bulk_recommendations_df' in locals():\n",
    "            new_rec_DF = pd.DataFrame(recommendation_list, columns = [col_header])\n",
    "            bulk_recommendations_df = bulk_recommendations_df.join(new_rec_DF)\n",
    "        else:\n",
    "            bulk_recommendations_df = pd.DataFrame(recommendation_list, columns=[col_header])\n",
    "        try:\n",
    "            line = json.loads(json_file.readline())\n",
    "        except:\n",
    "            line = None\n",
    "bulk_recommendations_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Wrap up <a class=\"anchor\" id=\"wrapup\"></a>\n",
    "[Back to top](#top)\n",
    "\n",
    "With that you now have a fully working collection of models to tackle various recommendation and personalization scenarios, as well as the skills to manipulate customer data to better integrate with the service, and a knowledge of how to do all this over APIs and by leveraging open source data science tools.\n",
    "\n",
    "Use these notebooks as a guide to getting started with your customers for POCs. As you find missing components, or discover new approaches, cut a pull request and provide any additional helpful components that may be missing from this collection.\n",
    "\n",
    "You'll want to make sure that you clean up all of the resources deployed during this POC. We have provided a separate notebook which shows you how to identify and delete the resources in `04_Clean_Up_Resources.ipynb`."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}